Minimum Number of Deletions and Insertions

Statement#

Given two strings, str1 and str2, find the minimum number of deletions and insertions required to transform str1 into str2.

Note: This problem has a direct application in the autocorrect feature. It is also used in bioinformatics to quantify the similarity between two DNA sequences.

For example, if we want to transform str1 “pqr” into str2 “tqr”, we need to delete “p” and insert “t” in str1. Therefore, the minimum number of deletions and insertions required are:

  • Deletions: 11
  • Insertions: 11

Constraints:

  • 11 \leq str1.length, str2.length 500\leq 500
  • str1 and str2 are in the same case English letters.

Examples#

No.

str1

str2

Deletions

Insertions

1

heap

pea

2

1

2

passport

ppsspt

3

1

3

bed

read

1

2

Try it yourself#

Implement your solution in the following playground.

Python
usercode > main.py
Input #1
Input #2
Minimum Number of Deletions and Insertions

Note: If you clicked the “Submit” button and the code timed out, this means that your solution needs to be optimized.

Hint: Use dynamic programming and see the magic.

Solution#

We will first explore the naive recursive solution to this problem and then see how it can be improved using the Longest Common Substring dynamic programming pattern.

Naive approach#

A naive approach is to process all the characters of each string, one by one. Starting from the first character of both strings, there are three possibilities for every pair of characters being considered:

  • The first characters of both the strings match, in which case, we increment the count of the matching characters by 1. We move one position ahead in both strings, and the function is called recursively on the remaining strings.

  • The first characters of both the strings do not match, in which case, the following possibilities are checked on both strings:

    • A recursive call by moving one position ahead in the str1
    • A recursive call by moving one position ahead in the str2
    • Return the maximum of the matching characters found by both the calls
  • If we reach the end of either of the two strings, we return 00.

After finding the n (maximum number of the matching characters subsequence), we can find the number of deletions required on str1 to transform it into str2 by subtracting n from the length of str1. The number of insertions required on str1 to transform it into str2 can be found by subtracting n from the length of str2.

Let’s implement the algorithm as discussed above:

Minimum Number of Deletions and Insertions using recursion

Note: Please observe that if you include the test case commented out in the driver code, the solution is likely to time out. Try to solve the larger problem using the dynamic programming solutions provided below and see the difference.

Time complexity#

The time complexity of the naive approach is exponential, i.e., O(2n)O(2^n), where nn is the length of the longest string.

Space complexity#

As the maximum depth of the recursive call tree is nn, and each call stores its data on the stack, the space complexity of this approach is O(n)O(n), where nn is the length of the longest string.

Optimized Solution using dynamic programming#

Now, let us improve the recursive solution using top-down and bottom-up approaches.

Top-down solution#

The top-down solution, commonly known as the memoization technique, is an enhancement of the recursive solution. It overcomes the problem of calculating redundant solutions over and over again by storing them in a table. In the naive approach, it computes the maximum number of matching characters subsequence many times. Therefore, we need to memoize by using a 2-D table, lookup_table. This table is initialized using the lengths of both strings in the min_del_ins function.

The basic algorithm works exactly the same as the one in the naive approach. The only difference is that for each character, the value of the maximum number of matching characters is stored in the lookup_table.

  • If the start characters match, the value returned is stored at lookup_table[i][j].

  • If the start characters don’t match, in which case, the following possibilities are checked on both strings:

    • A recursive call by moving one position ahead in the str1
    • A recursive call by moving one position ahead in the str2
    • Store the maximum of the matching characters found by both the calls in the lookup_table[i][j] and return it

Realize that since the values returned are being stored in the lookup_table, the next time the function is called with the same arguments, the recursive call won’t be made. Instead, the value will be directly returned from the lookup_table.

Let’s implement the algorithm as discussed above:

Minimum Number of Deletions and Insertions using memoization

Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.

Time complexity#

As we avoided redundant calculations by storing all the intermediate results in a 2-D table, the time complexity of this approach is reduced to O(n2)O(n ^2), where nn is the length of the longest string.

Space complexity#

We now need O(m×n)O(m \times n) space in memory to store intermediate values.

Bottom-up solution#

The bottom-up solution, also known as the tabulation technique, is an iterative method of solving dynamic programming problems. In this solution, we will again make use of a 2-D table, lookup_table, that stores the maximum number of matching characters subsequence required to find the minimum number of deletions and insertions for converting str1 into str2.

In this case, we build the solution bottom-up by dividing our problem into subproblems.

  • We first make a 2-D table lookup_table[m+1][n+1] where mm is the length of str1 and nn is the length of str2. This table is initialized with 00s. We need the first row and column to be 00 for the base case. Any entry in this table given by lookup_table[i][j] is the maximum number of matching characters subsequence between str1 up till ithi^{th} position and str2 up to the jthj^{th} position.

  • If the characters match, we store the returned values from lookup_table[i-1][j-1] in lookup_table[i][j].

  • For characters that don’t match, we need to take a maximum of two subproblems:

    • Move one position ahead in str1 (the subproblem lookup_table[i-1][j])
    • Move one position ahead in str2 (the subproblem lookup_table[i][j-1])
  • In the end, we have the optimal answer for the maximum number of matching characters subsequence for str1 and str2 in the last position, i.e., lookup_table[m][n].

With this, we can easily calculate the minimum number of deletions and insertions required to transform str1 into str2

Let’s look at the following illustration to get a better understanding of the solution:

Created with Fabric.js 3.6.6 We make the table, lookup_table, of size (4x5) for str1 "bed" and str2 "read".

1 of 16

Created with Fabric.js 3.6.6 For the base case, we intialize the first row and column with zeros.

2 of 16

Created with Fabric.js 3.6.6 Index (1, 1): Since the characters do not match, we take the max of the position to the left, and the position above.

3 of 16

Created with Fabric.js 3.6.6 Index (1, 2): Since the characters do not match, we take the max of the position to the left, and the position above.

4 of 16

Created with Fabric.js 3.6.6 Index (1, 3): Since the characters do not match, we take the max of the position to the left, and the position above.

5 of 16

Created with Fabric.js 3.6.6 Index (1, 4): Since the characters do not match, we take the max of the position to the left, and the position above.

6 of 16

Created with Fabric.js 3.6.6 Index (2, 1): Since the characters do not match, we take the max of the position to the left, and the position above.

7 of 16

Created with Fabric.js 3.6.6 Index (2, 2): Since the characters match, we will add 1 to the previous diagonal.

8 of 16

Created with Fabric.js 3.6.6 Index (2, 3): Since the characters do not match, we take the max of the position to the left, and the position above.

9 of 16

Created with Fabric.js 3.6.6 Index (2, 4): Since the characters do not match, we take the max of the position to the left, and the position above.

10 of 16

Created with Fabric.js 3.6.6 Index (3, 1): Since the characters do not match, we take the max of the position to the left, and the position above.

11 of 16

Created with Fabric.js 3.6.6 Index (3, 2): Since the characters do not match, we take the max of the position to the left, and the position above.

12 of 16

Created with Fabric.js 3.6.6 Index (3, 3): Since the characters do not match, we take the max of the position to the left, and the position above.

13 of 16

Created with Fabric.js 3.6.6 Index (3, 4): Since the characters match, we will add 1 to the previous diagonal.

14 of 16

Created with Fabric.js 3.6.6 We will return the last value of the table i.e., 2, representing the maximum number of matching characters subsequence

15 of 16

Created with Fabric.js 3.6.6 deletions = length of str1 - last value of table = 3 - 2 = 1insertions = length of str2 - last value of table = 4 - 2 = 2

16 of 16

Minimum Number of Deletions and Insertions using tabulation

Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.

Time complexity#

As we created a 2-D table to store the results of subproblems that would be used later on. Therefore, the time complexity of this approach will be O(n2)O(n^2).

Space complexity#

We need O(m×n)O(m \times n) space in memory to store the intermediate values.

Shortest Common Supersequence

Edit Distance Problem